The Ising Model for Neural Data: Model Quality and Approximate Methods for 

Extracting Functional Connectivity 
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We study pairwise Ising models for describing the statistics of multi-neuron spike trains, using 
data from a simulated cortical network. We explore efficient ways of finding the optimal couplings in 
these models and examine their statistical properties. To do this, we extract the optimal couplings 
for subsets of size up to 200 neurons, essentially exactly, using Boltzmann learning. We then study 
the quality of several approximate methods for finding the couplings by comparing their results 
with those found from Boltzmann learning. Two of these methods - inversion of the TAP equations 
and an approximation proposed by Sessak and Monasson - are remarkably accurate. Using these 
approximations for larger subsets of neurons, we find that extracting couplings using data from a 
subset smaller than the full network tends systematically to overestimate their magnitude. This 
effect is described qualitatively by infinite-range spin glass theory for the normal phase. We also 
show that a globally-correlated input to the neurons in the network lead to a small increase in 
the average coupling. However, the pair-to-pair variation of the couplings is much larger than this 
and reflects intrinsic properties of the network. Finally, we study the quality of these models by 
comparing their entropies with that of the data. We find that they perform well for small subsets 
of the neurons in the network, but the fit quality starts to deteriorate as the subset size grows, 
signalling the need to include higher order correlations to describe the statistics of large networks. 

PACS numbers: 87.85.dq,87.18.Sn,87.19.L- 



I. INTRODUCTION 

Computation in the brain is performed by large pop- 
ulations of neurons. Because these neurons are highly- 
connected, and because the external inputs that they re- 
ceive is usually correlated, the neuronal spike trains arc 
also correlated. These correlations depend on neuronal 
properties, synaptic connectivity and the external drive 
in a highly nontrivial way. 

The large number of neurons involved in any compu- 
tation, and the fact that they arc correlated, make de- 
ciphering the mechanisms of neural computations a dif- 
ficult challenge. A major technical breakthrough in this 
challenge has been the advent of techniques for recording 
simultaneously from large numbers of neurons. Yet, mak- 
ing a link between these recordings and an understand- 
ing of the computations is nontrivial and requires new 
mathematical approaches. This is because, in most cases, 
using the observed data to answer questions about com- 
putation requires building statistical models, i.e. writing 
down the probability distribution over spike patterns [l| . 
However, the high dimensionality of the space of possible 
spike patterns makes it very hard to collect enough data 
to build an exact statistical description of them. 

One approach to circumvent the problem of high di- 
mensionality of the space of spike patterns is to use para- 
metric models. In this approach, one uses the data to fit 
a parametric probability with a much smaller number of 
parameters than the dimensionality of the space of spike 
patterns. To use any such parametric model reliably, one 
needs to answer two questions: How can we fit the pa- 
rameters of the model efficiently, and how close is the 



model to the true probability distribution? 

In this paper, we try to provide answers to these ques- 
tions for the case of the maximum entropy binary pair- 
wise model, the Ising model, using data from a simulated 
network of spiking neurons. The Ising model has received 
a lot of attention as a parametric model for neural data 
following the study by Schncidman ei aZ These au- 
thors modeled the true distribution of the spike patterns 
by the Gibbs distribution of an Ising model: 



Pising(s) oc cxp ■ 



(1) 



where s = (si, S2, . . . , sat), and each spin Si = ±1 rep- 
resents the firing or not firing of neuron i. The external 
fields, hi, and the coupling parameters, J^, were fit so 
that the resulting distribution had the same means and 
pairwise correlations as the data, that is. 
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where ()ising represents averaging with respect to the 
Ising model distribution ([T|), and ()data represents aver- 
ages computed from the data. The couplings inferred in 
this way can be thought of as some sort of functional 
couplings between the neurons |3| • 

The experimental studies by Schneidman et al and oth- 
ers d, Q showed that the pairwise Ising model provided 
good approximations to the true distributions for sets of 
up to 10 neurons with small probability of spiking. These 
experimental studies were followed by a theoretical anal- 
ysis by Roudi et al 0] in which the authors studied the 
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quality of pairwise models using a perturbative expansion 
in NvSt, where N is the population size, V is the mean 
population firing rate, and 5t is the time bin chosen for 
binning the data. They showed that pairwise models al- 
ways provide a good statistical description of spikes as 
long as N is small compared to Nc = {p6t)^^ . That is, a 
pairwise model is almost guaranteed to give a good result 
for any N if the average firing rate is low enough and/or 
the time bins are small enough such that N <^ Nc- This 
may be taken as a good news, but there is a flip side to 
it: finding that the Ising distribution is a good model for 
N <^ Nc does not tell us whether it is going to be a good 
model in the large N limit. 

In the first part of this paper, we address the first 
question mentioned above, i.e., finding the parameters 
of the Ising model given the measured means and corre- 
lations: the "inverse Ising problem." Here, we study var- 
ious fast approximations for extracting the couplings of 
the Ising model and compare their results with the com- 
monly used, usually slow, but potentially exact, Boltz- 
mann learning algorithm [7[ . We show that the couplings 
found using the inversion of the TAP equations 
and the approximation recently proposed by Sessak and 
Monasson [lO| do a very good job in approximating true 
couplings. Furthermore, the inversion of the TAP equa- 
tions leads to overestimating the couplings, while the Ses- 
sak and Monasson approximation leads to underestimat- 
ing them, and we note that simply averaging them gives 
an even better result. We also study the dependence of 
the inferred couplings on the size of the subset of neu- 
rons for which the couplings are extracted. We find that 
the mean and standard deviation of the couplings exhibit 
size dependences compatible with those predicted for an 
infinite-range spin glass model (the SK model [ll| ) when 
it is in its normal phase. We also show that these de- 
pendences are mainly caused by scaling of the individual 
couplings rather than restructuring the couplings. 

We performed all this analysis for two sets of data. For 
one set, which we describe as "tonic firing" , the neurons 
in the network fired at constant rates. For the other set, 
which we call "stimulus-driven" , the external input to the 
network was varied temporally, evoking a modulation of 
the firing of all the neurons in the network and, thus, 
additional "stimulus- induced" correlations. Our findings 
arc nearly same for the two data sets, so in most of the 
following we show results only for the tonic case. The 
only systematic difference between the results in the two 
cases is a small increase in the mean of the inferred cou- 
plings in the stimulus-driven described in section 

El 

All studies to date on the quality of the pairwise models 
have been carried out for sets of neurons smaller than Nc 
by factors of 2 — 3 (in the regime where a good pairwise fit 
is trivial Q). In section |Vl] we test the quality of pairwise 
models for set sizes above A^c, using our data. We find, 
as predicted in Q, that the fit quality deteriorates as 
N increases and this continues to be the case even for 
N>Nc. 



II. SIMULATION DATA 

We obtained our data from a simulated model corti- 
cal network of 1000 spiking neurons, 80% of them excita- 
tory and 20% inhibitory, operating in a high-conductance 
state [III of balanced excitation and inhibition p^ . 
There is a general consensus that cortical networks op- 
erate in such a state. The connectivity in the network 
was random, with a 10% probability of connection be- 
tween any two neurons. The model is fairly realistic: 
its neurons have Hodgkin-Huxlcy spike generating dy- 
namics, and its synapses are modeled as conducting ion 
channels which are opened for short times by presynap- 
tic spikes. Its membrane potential and firing statistics 
are in good agreement with in vivo measurements on lo- 
cal cortical networks. The details of the simulations are 
described in Appendix VK\ 

Spike trains generated from the simulated network 
were divided into bins of length 10 ms. To each bin we 
then assigned a vector of spin variables s = (si, . . . , sjv) 
in which Si ~ —1 if neuron i did not emit a spike in 
that bin, and Si ~ 1 otherwise [1, d, We then com- 
pute the mean magnetization and pairwise correlations of 
these spin variables and use them to fit the Ising model 
that generates the same mean and pairwise correlations. 
In the analysis reported here, we only studied excitatory 
cells with mean magnetization larger than —0.98 (i.e., fir- 
ing rates greater than 1 Hz). We did this for two reasons. 
First, the estimation of the means and, importantly, cor- 
relations for cells with very small firing probabilities is 
very inaccurate. Second, fitting these small numbers us- 
ing essentially any method is also inaccurate. 



III. APPROXIMATE AND EXACT SOLUTIONS 
TO THE INVERSE ISING PROBLEM 

The simplest method for finding the fields and cou- 
plings such that Eqs. ([2]) are satisfied is Boltzmann learn- 
ing [7|. This is an iterative algorithm in which, at each 
step, the fields and the couplings are adjusted as follows: 
starting from some initial guess for the parameters, one 
computes the means and the pair correlations under the 
Ising distribution using the current values of the param- 
eters. One then makes changes 5hi and 5Jij in the pa- 
rameters according to 

5hi = 77 {(S,)data - (Si)lsing} (3a) 
SJij = 77 {(SiSj)data - (SiSj) Ising} • (3b) 

One then recomputes the model means and correlations 
using the new parameters, makes new parameter changes, 
and so forth until the model statistics agree with the 
measured ones within the desired accuracy. 

The averages over the Ising distribution can be done ei- 
ther by exact summation for small N, or by Monte Carlo 
sampling. In principle, the Boltzmann learning is exact in 
the sense that it is guaranteed to converge to the the cor- 
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rect fields and couplings after sufficiently many minimiza- 
tion steps, and for sufficiently many Monte Carlo steps 
per minimization step. Although Boltzman learning is 
in principle exact, it is usually a slow algorithm. This 
is particularly true for large A^, for which one needs to 
run very long Monte Carlo sampling steps per minimiza- 
tion step. It is therefore useful to have easily-computed 
approximate solutions, cither to use directly or as ini- 
tial conditions for Boltzmann learning. In this section, 
we compare the couplings obtained from four fast ap- 
proximation methods with those obtained from Boltz- 
mann learning. The four approximations are (a) naive 
mean-field theory, (b) an independent-pair approxima- 
tion, (c) a combination of (a) and (b) introduced recently 
by Sessak and Monasson (SM) and (d) inversion 

of the Thoulcss- Anderson-Palmer (TAP) equations [1, Q 
of spin-glass theory. These four approximations are de- 
scribed below. 



A. Naive Mean-Field Theory 

A naive mean-field theory (nMF) estimate can be de- 
rived simply by differentiating the mean field equations 
for the magnetizations with respect to the fields and using 
the fluctuation-response relationship. Using the notation 
rrii = {si) and Cij = (siSj) — niimj, this yields 



a. 



drrii d , ,, v-^ ^ , 



kj 



Equivalently, one can write 

jnMF ^ p-1 _ (--1 



(4) 



(5) 



where P,, 



(1 



B. Independent-Pair Approximation 

One simple approximation is obtained by treating ev- 
ery pair of neurons as if they were independent of the 
the rest of the system. Consider two spins, i and j, and 
let us denote the field on spin i (j) in the absence of the 
bond Jij between them by ft,^ (f^j)- We can then write the 
probability distribution over the states of this two-spin 
system as 



(6) 



where Zij is the partition function of this two-spin sys- 
tem. In writing the above equation, we assume that the 
state of spin i will not have any effect on and vice 
versa. (A sufficient condition for this to hold is that the 
system is on a Bethc lattice.) 



Eqs.[6]can be solved for J^, 



1 



j.r = I log 



P++P— 
P+-P-+ 



(7) 



where p++ is the probability that both spins are up, 

when the first spin is up the second one is down etc. 
Expressing the probabilities in terms of the means and 
correlations, one gets 



(1 + m, + mj + C*-)(l -rui- m.j + C*-) 
(1 - 771, + vfi^ - C* )(1 + - m.j - C* ) 



(8) 



where C*^ = Cy + miirij. 

In the low rate limit, 771^ - 
above expression simplifies to 



1 



1 



-1 and 



Cij 



(1 + m,){l + nii) 



-1, the 



(9) 



which is identical to the result derived by perturbative 
expansion in NvSt in Q which we simply call the low 
rate expansion. The fact that this low rate expansion 
gives identical results to the limiting case of independent 
pair approximation is expected since for sufficiently low 
rates, the contribution of feedback loops to the local field 
on each site can be neglected. Consequently, one can ig- 
nore the contribution from other spins to the correlation 
function of i and j. 



C. Sessak-Monasson Approximation 

Recently Sessak and Monasson [l^ derived an expres- 
sion relating the couplings to the means and correlations 
using a perturbative expansion in the correlations. This 
was done by extending the approach proposed by Georges 
and Ycdidia [T^ : instead of performing one Legendre 
transform to fix the magnetizations, as in p^ . they per- 
formed two Legendre transforms of the free energy, one 
to fix the magnetization and the other one to fix the 
correlations. They then expanded the result in a high- 
temperature Plefka series [1^ . The authors noticed that 
some of the terms in the expansion can be summed up, 
yielding a closed form approximation for the couplings 
that takes the form 



SM 



j^loop J^P^"' 



9m (10) 

(l-m2)(l-7772)-(Q,)2' ^ 



where J^^"' is given by Eq. [8] and 

= (i,L,)-i/2 [M(l + M)-i], 



(11) 



with L( = 1 - 777f, Mij = CijiULj)-^/^ and M„; = 
0. This expression for can easily be shown to be 

equivalent to the naive mean-field solution Eq. ([5]). 
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(a) {b} 




FIG. 1: Checking the rehabihty of the Boltzmann results, (a) 
The coupUngs of a set of = 200 found using Boltzmann 
learning with 2QQ0Q learning steps versus those found after 
40000 learning steps, (b) couplings learned from half of the 
data (200000 time bins), versus those learned from all the 
data (400000 time bins). 

D. Inversion of TAP Equations 

The TAP equations are mean-field equations that re- 
late the local magnetizations, Wi, to the external fields 
and the couplings 

tanh^"'^ nii = hi + Jij'm-j — mi jfj (1 — m^)- (12) 
] j 

The right-hand side is the total internal field acting on 
spin i, including the Onsager reaction field in the last 
term. Differentiating Eq. (fT^ with respect to nij and 
using the fluctuation-response relation, one obtains 

iC-% 2{j'^^fm,m,^ (^ ^ ,) 

(13) 

Given the means and correlations, we can solve 
Eq. to find the couplings and use the results in Eq. 
(fT^ to find the external fields. This is the simplest ver- 
sion of the scheme introduced by Kappen and Rodriguez 
and Tanaka 0]. The TAP equations are exact in the 
limit of "infinite-range interactions" where the J^s have 
means and variances that scale like For arbitrary 

couplings, the TAP equations constitute the first two 
terms in the Plefka series [l5| which is a small-coupling 
(high-temperature) expansion. In principle, one can in- 
clude higher-order terms in the Plefka expansion instead 
of Eq. ([T^ . compute its derivative to find the suscepti- 
bility and use the fluctuation-response relation to relate 
it to the connected correlations functions. Here we stop 
at the level of TAP equations. 

E. Comparison between Boltzmann learning and 
the approximate solutions 

We considered 4000 sec worth of data from our simu- 
lated network (about 1 hr, which is of the order of a sta- 
ble retina recording session), binned the data into 10-ms 




I BoJtzmann . Boltzmann 

JiJ JlJ 



FIG. 2: Scatter plots comparing the solutions found from dif- 
ferent approximations with the Boltzman learning results for 
= 20. (a) Naive mean-field approximation (b) Independent 
pair approximation, (c) low rate limit, (d) TAP, (e) SM, and 
(f) a hybrid approximation obtained by averaging TAP and 
SM. 



bins, and computed the means and equal-time pairwise 
correlations of the spin representation of the data (see 
sec. ini- We first inferred the couplings of the Ising model 
using 40000 steps of Boltzmann learning with a learning 
rate of 77 = 0.1. At each step, the model means and corre- 
lations were computed on the basis of 30000 Monte Carlo 
sampling steps. We then compared these results with 
those obtained from the approximation schemes listed 
in the preceding subsection. In this comparison, we are 
assuming that the couplings inferred using Boltzmann 
learning are the correct ones and judging the approx- 
imate methods according to how well they agree with 
them. Before going on with the comparison results, we 
take a moment to justify this claim. 

The results of the Boltzmann learning may not be cor- 
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accurate. 

We now move on to the comparison of the couphngs 
found from the approximate solutions with those found 
from Boltzmann learning. The results for a set of iV = 20 
neurons is shown in Fig. [2] This figure shows that for 
this subset size, all approximations do well, although the 
Sessak-Monasson approximation outperforms the others 
by a small margin, followed closely by the TAP-inversion 
solution. For larger sets, the difference between the qual- 
ity of different approximations becomes more clear. Fig. 
[3] shows the results for TV = 200. Here the Sessak- 
Monasson approximation and TAP inversion outperform 
the rest by a significant amount. Figs. [2] and [3] also show 
that the SM and the TAP-inversion approximations dif- 
fer in the way their errors: SM tends systematically to 
underestimate the couplings, while TAP inversion over- 
estimates them. This suggests that naively averaging the 
two, i.e., summing them and dividing by two, should do 
a better job. The results of such a hybrid approximation 
shown in Figs. [2f and[3f confirm this expectation. 




0.5- 



N 



FIG. 3: The same as Fig. [2] but for iV = 200 neurons. 



rect for two reasons. First, the correlations passed to the 
Boltzmann algorithm are, in general, not the true correla- 
tions, since they are computed from finite data. Second, 
Boltzmann learning converges to the true results only in 
the limit of infinitely many learning steps, and there is 
always a chance that one has not run it long enough. To 
see how much error such effects led to in our results, we 
conducted two tests. First, we divided our spike trains of 
4000 sec into two halves and computed two sets of corre- 
lations and means: one from the first half of the data and 
other other from the whole set (this latter set is what we 
use in the subsequent analysis). Wc then scattcr-plotted 
the Jij's inferred from the first half of the data versus 
those computed from the full 4000 seconds of simulation. 
We also plotted the results found after 20000 learning 
steps against those found from 400000 steps (using cor- 
relations computed from all our data). The results are 
plotted in Fig. [1] This figure shows that within the scale 
of the errors of the various approximations (see Fig. [3]), 
the Boltzmann results can be considered to be stable and 
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FIG. 4: Quantifying the performance of different approxima- 
tion by computing (defined in Eq. I14|l and the RMS error 
(defined in Eq. llSp between the approximate solutions and 
the result of Boltzmann learning, as a function of A*'. Black 
(long dashed line), SM; Red (short dashed line), TAP; Blue 
(full curve), hybrid SM-TAP; Green (dashed dotted line), 
nMF; Magenta (dashed double dotted), low-rate approxima- 
tion; Gray (small dotted), independent-pair approximation. 

The performance of different approximations as a func- 
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FIG. 5; The A''-dependence of the mean and standard de- 
viation of the solutions found from different approximations 
and Boltzmann learning. Black (long dashed line), SM; Red 
(short dashed line), TAP; Green (dashed dotted line), uMF; 
Blue (full curve), hybrid SM-TAP; Magenta (dashed dou- 
ble dotted), low-rate approximation; Gray (small dotted), 
independent-pair approximation; Brown (large dots), Boltz- 



tion of set size is shown in a more systematic way in Fig. 
m In this figure, we computed the similarity between the 
Boltzmann solution and the approximate ones for differ- 
ent sizes, using two quantities as measures of similarity. 
The first was the coefficient of determination, , defined 
as 



approx 
ij 



J, 



Boltzmann \ 2 



E( rBoltzmann TBoltzmann\2 



(14) 



where J^J'^'^°^ refers to the couplings inferred using one 
;he al 



of the aforementioned approximations, and jBoitzmann 



indicate good approximations. 
RMS error defined as 



We also considered the 



1 



N{N - 1) 



approx 
ij 



TBoltzmann'\2 
ij ' ■ 



(15) 



1). Values of R^ close to one 



As shown in Fig. 01 TAP inversion, SM and the hybrid 
TAP-SM outperform the other approximations for all N 
and according to both measures. 

We also studied the relationship between the N- 
dependence of the couplings inferred using Boltzmann 
learning with those found from the approximate solu- 
tions. The results are shown in Fig. O The standard 
deviation of the 's is well-approximated by both TAP 
inversion and the Sessak-Monasson formula (and there- 
fore also by their average). Naive mean- field theory cap- 
tures the decrease with N quahtatively correctly but 
gives an estimate which is systematically too large. The 
independent-pair approximation fails (as does its low- 
rate limit, Eq. We will study the A^-dependence 
of the couplings more carefully in the next section. 

The means of the 's are smaller than their standard 
deviations by an order of magnitude or more. The Boltz- 
mann value is indistinguishable from zero for N > 50. Of 
all the approximations, only the SM-TAP average seems 
to approximate it well, although SM, TAP and naive MF 
estimates all show a decrease with N. 

If averaged over many samples, the independent-pair 
approximations will not exhibit any A^-dependence in 
either means and standard deviations of the couplings. 
Thus, they can not capture the observed systematic de- 
creases of these statistics with N. 



IV. SCALING OF INFERRED COUPLINGS 
AND THE COMPARISON WITH MEAN-FIELD 
SPIN GLASS BEHAVIOR 

Fig. [5] shows that the mean and standard deviation of 
the inferred couplings have some sort of scaling behaviour 
with population size and that this behaviour is well pre- 
served in the TAP, SM and hybrid TAP-SM approxima- 
tions. This A^-dependence could be either due to partial 
or complete restructuring of the couplings when more and 
more neurons are added, or simply due to the scaling of 
individual couplings. Fig. [H] and Fig. [7] show that what 
is actually happening is the latter. Fig. shows the in- 
ferred couplings between 20 neurons when no additional 
neurons are considered, while Fig.[5j3 shows the couplings 
between these neurons when they are extracted from the 
data of a set of 200 neurons. As one can see in this fig- 
ure, the main structure of the couplings of the subset is 
retained in the larger set, but they are scaled down. In 
Fig. [71 we trace how the largest ten (full blue lines) , and 
the smallest (i.e., most negative) ten (red dashed lines) 
of a set of 20 neurons change as we add more and more 
neurons to the pool. As can be seen, the small weights 
increase their values, and the large weights decrease their 
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(a) 




(b) 




FIG. 6: The couplings among 20 neurons, inferred when no 
additional neurons are considered (a), and when inferred as 
a part of a network of size 200 (b). This figure shows that 
the A^-dependence of the mean and standard deviation of the 
couplings in Fig. [5] arises from the scaling of individual cou- 
plings. 



values with N, but only the weights that are very close 
to each other cross. This observation suggests that the 
structure of the weights is preserved as N is increase, i.e. 
there is an approximate scaling behaviour for individual 
weights. 

How can we explain this scaling of the weights? Be- 
cause there is no spatial structure in the connectivity 
of our original simulated model, we do not expect to 
find any such structure in the functional connections Jij 
that we obtain. Thus, it seems possible that our in- 
ferred models will be "infinite-ranged." If so, we may de- 
scribe the statistics of the couplings by the infinite-range 




FIG. 7: The behaviour of the 10 largest and 10 smallest cou- 
plings in the set of 20 neurons, when more and more neurons 
are added. The blue (full) curves show how the 10 largest 
weights in the population of 20 neurons change their values as 
we ass more and more cells, and the red (dashed) curves show 
the same thing for the 10 smallest couplings. The weights are 
inferred using Boltzmann learning. 



in which one assumes independently distributed Jij with 
a mean Jq/N and a variance J'^/N. This model both 
has a normal and a spin glass phase. The normal phase 
may be characterized completely in terms of two order 
parameters, the mean magnetization Af = rrii 
and the mean square magnetization q = N^^ mf. In 
the spin glass phase, a much more complex description is 
required. Fortunately, as shown below, we will only need 
to consider the normal phase, where one can derive rel- 
atively simple relations that express the mean and vari- 
ance (across all pairs of spins) of the correlations in terms 
of the mean and variance of the couplings and the order 
parameters of the model. These relations can be inverted 
to give the mean and variance of the Js in terms of the 
mean and variance of the C's. The size of the system 
enters in these relations, so they make predictions about 
the A'^-dependence of the Js that we can compare with 
the results of the inference algorithms for different sizes 
of sets of neurons. 

Consider first the average (over off diagonal elements) 
of the correlation matrix, C = [N{N — ^ij- 
Standard mean-field arguments (e.g., averaging Eq. (|¥1)) 
lead to 



(16) 



The mean square of the correlation matrix elements can 
also be shown to be [3 (see also [13] sec. 3.2) 



(J2 



where 



Sherrington-Kirkpatrick (SK) model of a spin glass 



mge 

0, 



(17) 



(18) 
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FIG. 8: Comparing the A'^-dependence of the mean and stan- 
dard deviation of the solutions found from the first three good 
approximations (SM (black, long dashed), TAP (red, short 
dashed), SM-TAP hybrid (solid blue)), with the SK predic- 
tion (Magenta, dotted). The Boltzmann results for individual 
samples up to A'^ = 200 are replotted from Fig. ((5)1 for com- 
parison (brown stars). 



Inverting Eqs. and (fT7|) to obtain the statistics of 
the Js in terms of those of the Cs, we find 



N {l-q){l-q + NC) 

var J = — = =^ 

N S{S + NC^) 



(19) 
(20) 



These results hold for N equal to the full system size. 
However, in the inference process described here, one 
works with data from a smaller number of neurons and 
tries to model them by a network of that smaller size. 
Thus, the inferred Js will have (larger) means and vari- 
ances than their true ones because the N in the denomi- 
nators of Eqs. (fT5|) and (PO)) will be smaller than the true 



size. (Note that the statistics of the measured Cs do not, 
on average, depend on the size of the set of neurons.) 

The criterion for the stability of the normal phase [TB| 
is J'^S > 1. Using our extracted values of J and calcu- 
lating 5* from Eq.[T8l we find that J'^S grows with N but 
never exceeds 0.65. Therefore the assumption that the 
system is in the normal phase is self-consistent. 

To the extent that our inferred network is like an SK 
model, Eqs. p9p and (PD|) should describe the way the 
mean and variance of the Js very with N. This is easy 
to test because the statistics of the C"s and the moments 
of the rrii that occur in q and S are readily calculated 
from the spike data. Fig. [S] shows how well the results of 
the inference algorithms conform to this simple behavior. 
In this figure, the means computed from the SM, TAP, 
and hybrid SM-TAP approximations, averaged over 20 
random samples of the excitatory neurons for N < 300 
and 10 samples for N > 300, are shown, together with 
the results of Boltzmann learning on individual samples 
and the mean-field spin glass predictions. The quantities 
S, q, C and that appear in Eqs. ([20|) and (fT9|l are com- 
puted from the whole population of excitatory neurons. 

The agreement of Eqs. (fT9)) and ((20)) with the results of 
our parameter extraction methods is not perfect, but the 
magnitude of the standard deviation of the Jy 's and its 
falloff with N are captured reasonably well. The mean 
agrees well with the TAP result, but of course the TAP 
Jij 's are systematically higher than the true (Boltzmann) 
ones, as described above. 



TONIC VERSUS STIMULUS-DRIVEN 
FIRING STATES 



In the previous sections, we studied tonic- firing data 
only. When we conducted the same analyses for the 
stimulus-driven data, all the same conclusions about the 
quality of different approximations and their size depen- 
dencies were drawn. The only difference that we observed 
is slightly higher mean coupling found in the stimulus- 
driven case. The similarity between the weights can be 
seen by comparing Fig. [5] and Fig. [S^. The increase in 
the couplings of the weights is too small to see here. It 
is also too small to show up in a scatter plot. 

The shifts in the mean coupling computed by Boltz- 
mann learning and the good approximation methods 
(SM, TAP and their average) are shown in Fig.[5]3. While 
only the SM-TAP average gets the mean coupling right, 
all three methods capture the stimulus-induced shift. 

The higher mean couplings can be understood qualita- 
tively using Eq.[Tni In the stimulus driven case, the mean 
correlation is slightly higher than the tonic case (the dif- 
ference is what is generally called "stimulus-induced cor- 
relations"). For the data used for Fig.O), Cgtim = 0.0052 
versus Ctonic ~ 0.0023, leading in Eq. to a larger J. 
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defined as 



DkL (Ptrue I bising) = ^ Ptruo (s) In | P*''"°(^M 



(21) 



In addition, we considered the KL divergence between the 
true distribution and an independent-neuron distribution 



Pind oc exp 



(22) 



in which the external fields /ij"'^ arc chosen so that Eqs. 
[2a] are satisfied. We also define dind = £'KL(ptrue| bind)- 
Denoting the averages of dising and dind over many sam- 
ples of a given size N by dising and d-md, we can define 



(b) 



G = 1 



^Ising 



(23) 
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FIG. 9: The A'^-dependence of the mean couphngs in the 
stimulus-driven case as found from Boltzmann learning on 
single samples (brown squares), SM (black long dashed lines), 
TAP (red short dashed hues), SM-TAP hybrid (blue full 
curve), and the prediction of Eq. (|19p (magenta dotted line). 
The thick curves are the results for the stimulus-driven case, 
and the results of the tonic case are plotted using thin curves. 



VI. THE TRUE DISTRIBUTION VERSUS THE 
MODEL DISTRIBUTION 



In this section, we study the second question about 
pairwise models raised in the introduction, i.e. the qual- 
ity of the model in approximating the true distribu- 
tion of spikes. To compare the fitted Ising distribution, 
Pising- with the true distribution, ptpisLJ we considered the 
KuUback-Leibler (KL) divergence [131 between the two. 



as a measure of the goodness of pairwise models 0, H, 
H, Hi • (Other studies [1, Q have used the measure A = 
dising/dind = 1 " G.) It is easy to show that 



c'ind — Sind — Strue (24a) 
C^Ising — »5lsing 'S'truej (24b) 



and consequently 



G 



Si 



ind 



Sis 



Sir 



Stil 



(25) 



where Rising, ^ind and S'truc arc the entropies of piling, Pind 
and ptrue J and the over line indicated averaging over many 
samples of the same size. The quantity G is the fraction 
of the entropy difference between the independent model 
and the data that is explained by the pairwise model. 
When G is near one, the pairwise models is very good 
(compared to the independent model) in terms of the 
amount of the true entropy that it explains. When G = 0, 
the pairwise model is just as bad as the independent- 
neuron model. 

In Fig. [TOl we show the behavior of c?ind, Rising and 
G versus N. To produce this figure, we first chose a 
population of 15 neurons. For each N, we then chose 
(]^) or 2500 random populations of N neurons from 
the original 15 cells, whichever was larger. For each of 
these populations, we computed the entropy using T time 
bins from the simulation by simply counting the number 
of occurrences of each pattern. Wc also computed the 
means and correlations from these T time bins and fit 
an independent-neuron and an Ising model to the data. 
Fitting of the parameters of the Ising model was done 
by numerically exact minimization of the error function 
Z]sPtruo(s)log[ptruG(s)/pising(s)], using Conjugate gradi- 
ent descent. We then computed the entropies of the in- 
dependent and Ising models using brute force summation 
over all the states. We did this procedure using T = 10^, 
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T = 1.5 X 10*5 and T = 1.8 x 10^. The resulting val- 
ues of dind, falsing and G for each of these values of T 
is shown in Fig. [TOl As expected, finite T leads to an 
underestimation of S'truo and thus overestimation of djnd 
and c?ising- To correct for finite sampling bias, the re- 
sulting values of o?ind and dising for each T were fit by a 
second order polynomial in 1/T, and the limit T —^ oo 
was taken [l9[. The unbiased estimates are shown in 
black in Fig. [TOl Both di^d and dising increase with N, 
while G decreases. For the population shown here we had 
^"^Ei(si)data ~ "0.8, indicating that iV^ « 10. This 
figure shows that for small populations G is close to 1, 
but it decreases linearly even for values of N above Nc- 



VII. DISCUSSION 

Even with their shortcomings, models of the type we 
have studied here provide a potentially attractive frame- 
work for analyzing multi-neuron spike data. We imagine 
that experimentalists would want a quick and easy way 
to find out what J^s characterize the spike data they 
have measured. In previous work, the extraction of the 
JijS was done by brute force, using Boltzmann learn- 
ing. This is in principle exact but very slow for large 
N. The fast, approximate parameter-extraction meth- 
ods described here offer a way to make Ising pair models 
a practical data-analysis tool. 

We calibrated these fast methods by comparing their 
results with that of Boltzmann learning for sets of neu- 
rons up to a size N = 200. which took several days for a 
single run. We were able in this way to evaluate and com- 
pare several methods: (1) the independent-pair approxi- 
mation, which is known p to be correct in the low-firing 
rate, small- iV limit. (2) inversion of the mean-field equa- 
tions for the correlations, (3, SM) a combination of (1) 
and (2) proposed by Sessak and Monasson [l3|, and (4, 
TAP) inversion of the Thouless- Anderson-Palmer equa- 
tions. 

Of these approximations, the third and fourth turned 
out to work very well, with SM slightly better that TAP. 
SM has a slight tendency to underestimate the Jy S and 
TAP has a slight tendency to overestimate them. We 
found that an ad hoc averaging of the SM and TAP Jij 's 
agreed even better with the Boltzmann learning results, 
with an rms error of about half that achieved by SM 
or TAP. Of course, this result has to be taken as just 
a bit of good luck for the particular network used to 
generate these data; there is no reason to expect it to 
hold generally. 

We could then proceed to apply the good fast approx- 
imation schemes for N > 200 and to identify the im- 
portant generic features of the extracted couplings. We 
found that for larger the Jij's had a nearly zero mean 
value and that their absolute values appeared to shrink 
systematically as N increased. Furthermore, the Jij's 
found to be strongest (i.e., to have the largest absolute 
values) for one set of neurons were also generally found 
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FIG. 10: The behavior of the KL distances and G versus the 
set size, A'^. (a) dind, the KL distance between the indepen- 
dent and the true distributions versus N, (b) dising, the KL 
distance between the Ising and the true distribution versus 
N, (c) the goodness of fit, G, versus A^. Magenta circles, 
red crosses and blue squares represent estimations of these 
quantities from T = 10^ T = 1.5 x 10*' and T = 1.8 x lO'' 
samples respectively, while black stars show the bias-corrected 
(T — > oo) estimates. 
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to be the strongest within that set when the extraction 
was done for a larger set of neurons. Thus, the strong 
Jij's appeared to be quite robust statistics. 

A measure of the typical magnitudes of the Jy 's is pro- 
vided by their standard deviation. Although the fit is not 
perfect, the decrease in the standard deviation with N is 
captured crudely by a simple theoretical picture in which 
one assumes that the 's are chosen randomly and inde- 
pendently. In other words, as far as its pair correlations 
are concerned, the network behaves approximately like a 
Sherrington-Kirkpatrick (infinite-range) spin glass. This 
finding is not too surprising, in view of the fact that the 
network used to generate the data had completely ran- 
dom connectivity. Perhaps it is more surprising that the 
data deviate systematically from the spin-glass predic- 
tion. We do not have any explanation of these deviations. 

We analyzed spike data for both tonic-firing and 
stimulus-driven conditions. In the latter, the input to the 
simulated cortical network was varied temporally. The 
fact that all neurons received input with the same tem- 
poral modulation might be expected to generate extra 
correlations, which would be reflected in increased J^ 's. 
In fact, this did happen, but the effect was very small. 
The values of the 's found in the two conditions were 
nearly the same. Thus, the couplings obtained (in par- 
ticular, the ways they vary from one pair or neurons to 
another) are intrinsic properties of the system. The only 
systematic effect of the stimulus was a small increase in 
the average Jij. This weakness of this effect is perhaps to 
be expected, because the temporal modulation employed 
was rather slow (a time constant of 100 ms) compared to 
response times in the network (10 ms or less). It would 
of course be of interest to study the effect of varying the 
time constant of the input rate fluctuations. 

A natural question to ask is whether the Jij 's one finds 
have any relation to the synaptic connections in the net- 
work that generated the data from which they are ex- 
tracted. In our case, we know that synaptic connectivity, 
so we can answer this question. Somewhat disappoint- 
ingly, however, we have found no significant relation be- 
tween the Jij's and the synapses. We believe this is be- 
cause we are trying to force a description with symmetric 
couplings (Jij = Jji) onto a network where this symme- 
try is absent or nearly so. We think another approach 
would be required to uncover the underlying synaptic 
connectivity, one based on correlation between spikes by 
different neurons in different time bins, rather than, as 
here, coincidences in the same time bin. Then one might 
be in a position to identify which neurons' spikes tend 
to cause spikes in which other neurons, which is more 
closely related to synaptic connectivity. 

Of course, even within the present paradigm, pair mod- 
els are not guaranteed to be exact. For our data and for 
small subsets of the neurons {N < 15), we were able to 
quantify the degree of mismatch in terms of the KuUback- 
Leibler distance between the true distribution and the 
Ising model Gibbs distribution. In agreement with ear- 
lier results we found that pairwise Ising models 



perfectly model the true distribution in the limit of small 
A^. We also found that the quality-of-model measure, G 
(Eq. [23]), decreases linearly with A'' for the range that we 
tested. For A^ ^ Nc, this decrease can be understood us- 
ing the expansion in NVSt of Roudi Q . To lowest or- 
der one has d\nd oc (NvSt)'^ and dising oc (NvSt)^; conse- 
quently G oc 1 — NvSt. However, this expansion is bound 
to break down, as it will eventually predict a true entropy 
that decreases with A^. This can be seen by noting that 
S'ind A^ and therefore, S'truo = •S'ind — djnd = CiN — C2N^ 
will be a decreasing function of A^ for A^ > Ci/(2c2). Nev- 
ertheless, G can still be a decreasing function of A^ even 
when the expansion breaks down, and indeed we see in 
Fig. [To] that this is the case for our data. The decrease 
in our data is of the order of 5% for A^ = 10, suggest- 
ing that one should be cautious in applying pair models 
for A^ bigger than about 50 or so (where the pair model 
only explains about 75% of the entropy difference be- 
tween the independent-neuron model and the data). If 
we naively extrapolate the linear dependence of G on 
N, we find G w for A^ « 200, indicating that at this 
size a pairwise model would be no improvement on an 
independent-neuron model. 

Nevertheless, even when they arc not good models 
(in the sense that the KuUback-Leiblcr distance between 
them and the true distribution is not small), pairwise 
models offer a conceptually simple and useful framework 
for characterizing measured multi-neuron spike statistics. 
One the one hand, by construction, they describe the 
first- and second-order statistics correctly, and on the 
other hand they are the only models for which it is prac- 
tically feasible to carry out the fit using data sets of 
realistic size. When used with caution, they can pro- 
vide robust and reliable information about the correla- 
tion structure in the data, and the fast approximations 
we have described here should be useful in applying them 
in practical data analysis. 



APPENDIX A: THE SIMULATED MODEL 
CORTICAL NETWORK 

The means and correlation functions we use in this 
paper are obtained from simulating a network consist- 
ing of 1000 Hodgkin-Huxley-like model neurons with 
conductance-based synapses, 800 excitatory and 200 in- 
hibitory. The network was driven by an external popu- 
lation of 800 excitatory neurons. The connectivity was 
random, with the probability of a synapse between any 
two neurons equal to 10%. There was no synaptic ran- 
domness beyond that implied by the random connectiv- 
ity: When a synapse was present, it had a maximum 
conductance which depended only on which populations 
the pre- and postsynaptic neurons belonged to. The dy- 
namics of the membrane potential. Via , of a neuron i in 
population a = E (excitatory) or / (inhibitory) is given 
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by 

- T.G7aiAm^^-^r). (Ai) 

where = and = -80 mV, G^/"*"' is the intrin- 
sic conductance of type a = Leak, Na, or K of neuron i in 
population a, and G^^" ^ (<) is the conductance associated 
to the synapse from neuron j in population h to neuron 
i in population a. The intrinsic conductances are of the 
standard Hodgkin-Huxley form C^.'"''' = g^^mP^" h^" , with 

PNa = 3, gNa = 1, PK = 4, = 0, PLoak = ^Leak = 0. 

The gating variables m„ and h„ obey standard kinetics 
with voltage-dependent opening and closing rates. The 
forms of these rates, as well as the values of the 3°, were 
taken from and Pare [lO] • 

The synaptic conductances G^^^,- (t) were obtained by 
filtering the presynaptic spike trains through a sequence 
of three exponential filters. The time constants of these 
filters, representing the synaptic delay, the rise time and 
the fall time of the synaptic conductance after a presy- 



naptic spike, were chosen randomly from uniform distri- 
butions of means 1, 3, and 5 ms, and half- widths equal 
to 90% of the means, respectively. 

In the tonic state the firing rate of the external pop- 
ulation was constant, leading to constant firing rates for 
the neurons in the network. Because of the randomness 
in the network structure, these rates varied from neuron 
to neuron. The maximum synaptic conductances were 
chosen so that the total average synaptic conductance 
was in the range found by Destexhe et al and so that 
the inhibitory neurons fired on average at about twice 
the average rate of the excitatory ones. In the stimulus- 
driven state, the rate of the external population was mod- 
ulated randomly in time, in order to mimic qualitatively 
the experiments of Schneidman ei aZ 0] , where movies of 
dynamic natural scenes were projected onto salamander 
retinas. Specifically, we took the external rate to be a 
constant plus exponentially filtered white noise, with a 
time constant of 100 ms. As a result, the firing rates of 
the neurons in the network also varied in time. The noise 
parameters were chosen so that the averages of the firing 
rates over intervals much longer than the time constant 
were approximately the same as those in the tonic state. 
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